Towards Building Error Resilient GPGPU Applications

Authors

  • Bo Fang
  • Jiesheng Wei
  • Karthik Pattabiraman
  • Matei Ripeanu
Abstract

GPUs (Graphics Processing Units) have gained wide adoption as accelerators for general purpose computing. They are widely used in error-sensitive applications, i.e. General Purpose GPU (GPGPU) applications. However, the reliability implications of using GPUs are unclear. This paper presents a fault injection study to investigate the end-to-end reliability characteristics of GPGPU applications. The investigation showed that 8% to 40% of the faults result in Silent Data Corruption (SDC). To reduce the percentage of SDCs, we propose heuristics to selectively protect specific elements of the application and design fault detectors based on these heuristics. We evaluate the efficacy of the detectors in reducing SDCs and measure their performance overheads. Our results show that the heuristics are able to reduce the SDC-causing faults by 60% on average, while incurring reasonable performance overheads (35% to 95%).

Similar resources

Evaluating the Error Resilience of GPGPU Applications

Over the past years, GPUs (Graphics Processing Units) have gained wide adoption as accelerators for general purpose computing. A number of studies [1, 2] have shown that significant performance gains can be achieved by deploying GPUs on traditional high performance computing (HPC) systems that host demanding scientific applications. However, the reliability implications of using GPUs are unclea...

Error Resilience Evaluation on GPGPU Applications

While graphics processing units (GPUs) have gained wide adoption as accelerators for general-purpose applications (GPGPU), the end-to-end reliability implications of their use have not been quantified. Fault injection is a widely used method for evaluating the reliability of applications. However, building a fault injector for GPGPU applications is challenging due to their massive parallelism, ...

Towards Multi-tenant GPGPU: Event-driven Programming Model for System-wide Scheduling on Shared GPUs

Graphics processing units (GPUs) are attractive for general-purpose computing (GPGPU) beyond graphics. Sharing GPUs among such GPGPU applications is a key requirement, especially for cloud platforms whose resources are utilized by various cloud users. However, consolidating recent GPU applications, referred to as GPU eaters, on a GPU poses a new challenge. Such advanced application...

Soft Error Resilient QR Factorization for Hybrid System

As general purpose graphics processing units (GPGPU) are increasingly deployed for scientific computing for their raw performance advantages compared to CPUs, the fault tolerance issue has started to become more of a concern than before, when they were exclusively used for graphics applications. The pairing of GPUs with CPUs to form a hybrid computing system for better flexibility and perform...

Fault injection on GPGPU application

Today, with the development of GPU computing techniques in terms of architectures and hardware and software support, people have realized that intensive computing workloads can be ported to GPU devices. Applications can exploit GPUs’ characteristics for parallel computing and gain a significantly high speedup compared to CPU architectures. However, failures are still unavoidable. People have alrea...

Publication date: 2012